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Abstract Female fig wasps differ phenotypically from conspecific males to the extent that often they cannot be associated with one 
another. Weighted gene co-expression network analysis (WGCNA) of the genome and transcriptomes of one such fig wasp, 
Ceratosolensolmsi, generated five expression modules, which were flagged as blue, turquoise, brown, green and yellow. These involved 
two female-biased expression modules and three pupa-biased expression modules, respectively. Gene ontologies indicated three 
functional enrichment gene sets in modules turquoise and yellow. Two functional enrichment gene sets that participate in cell cycle or 
have nucleotide binding activityclustered in turquoise module. The functionally enriched gene set in yellow module played roles in cell 


differentiation, especially in neuron morphogenesis. 
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Genes that co-participate in biological processes groups of co-expressed functional genes from 


generally display similar expression patterns. Often transcriptomes of male and female C. solmsi. 


such gene sets involve functional enrichments, which š 
z : 1 Materials and methods 
clustering methods can reveal. Thus, clustering of gene 


sets facilitates an understanding of the roles of genes 
play in a biological processes"!. 

Weighted gene co-expression network analysis 
(WGCNA) can cluster together genes that function in 
concert. One of the most direct methods for identifying 
gene co-expression networks "!, WGCNA has been 
successfully applied to the identification of functional 
enrichment modules in complex diseases®*. 

Fig wasps are obligate symbiosis of their fig 
hosts. Female and male fig wasps exhibit extreme 
morphological dimorphism ™. Recently, the genome 
and transcriptomes of Ceratosolensolmsi were 
published". Herein, we employ WGCNA to identify 


1.1 Differentially expressed genes 

Fig wasps has three developmental stages: larva, 
pupa and adult", Transcriptome data were accessed in 
the NCBI Short Read Archive(www.ncbi.nIm.nih.gov/ 
sra) under the accession No. SRP029703. The FPKM 
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method (fragments per kb per million reads) "" was 
used to calculate the expression of unigenes. Each 
gene expression (GE) value was log2-transformed 
from its FPKM value. To identify the differentially 
expressed genes, we used the distribution of the overall 
GE value to remove lowly expressed genes"”. In total, 
7 881 genes with GE>0.5 in at least four samples were 
obtained. Next, we used the coefficient of variation 
(CV) value to identify the differentially expressed 
genes'"!, Ultimately, we obtained 2 938 genes with 
CV <2. 
1.2 Screening and clustering of highly correlated 
genes 

WGCNA was used to describe the correlation 
patterns among 2 938 genes across the transcriptomic 
datasets and cluster the genes with highly correlated 
expression patterns. Firstly, the weights of each of the 
two genes were calculated using a soft threshold power 
of 12 because the R^2 was greater than 0.7 when 
applying this threshold". Secondly, two genes with a 
weight of greater than 0.476 were considered to 
possess highly correlated expression patterns for 
0.94412" Thirdly, 
genes were clustered when minModuleSize = 5 and 


candidate highly correlated 


mergeCutHeight = 0.05. Finally, each module of gene 
expression pattern was visualized using Cluster 3.0 and 
Treeview!" "7, 
1.3 Annotation and functional speculation 

Fig wasp genes were not well annotated. 
Therefore, the well-annotated genes of Drosophila 
melanogaster were used to imply the functions of fig 
wasp genes. Drosophila protein sequences were 
downloaded from Flybase. Our co-expressed genes 
were aligned to those in Flybase by using BLASTP"”. 
the highest 
sequences of fig wasp genes to those in Flybase. The 


Functional annotations had similar 
fly gene IDs, which were mapped by fig wasp genes, 
were put into Annotation, Visualization and Integrated 
Discovery (DAVID)"! to deduce enrichment levels of 


the latter. 


2 Results 


2.1 Highly correlated genes grouped into five 
co-expression modules 

Using a gene expression value (GE) of > 0.5"! 
and a coefficient of variation (CV) of < 2"), 2 938 
differently expressed genes were obtained. The 
weight-values between each two genes were calculated 
using WGCNA. For better screening of the highly 


correlated co-expression genes, 0.476 was taken as a 
co-expression threshold value. This resulted in 336 
genes being chosen. The Cytoscape v2.6.3! network 
divided 336 genes into four parts (Figure 1). The 
features of the network were show in Figure S1 ~ S4 
and Table S1. Interestingly, the 336 genes clustered 
into five modules using WGCNA clustering function 
(Figure 2). Depicted in colors, these consisted of 54 
(green), 62 (yellow), 63 (brown), 76 (blue) and 81 
(turquoise) genes. 

The turquoise and blue modules clustered 
together, as did green and brown. Yellow module did 
not cluster with any other module (Figure 1, 2). To 
confirm that co-expressed genes had similar levels of 
expression, we visualized them in heatmaps using 
Cluster 3.0 and Treeview "®™!, Genes clustered in 
brown, green and yellow modules were highly 
expressed in females, and genes in blue and turquoise 
module were expressed in pupa (Figure S5). 

2.2 Gene ontology 

In total, 336 co-expressed genes clustered into 
three co-expression networks and five modules. These 
results suggested that each module of genes should 
have unique, enriched functions. Our annotation of fig 
wasp gene functions by using BLASTP against the 
genome of Drosophila discovered 251 genes(Table S2). 
These genes were then analyzed for the functional 
enrichments using DAVID '. Three annotation 
clusters with enrichment scores of > 3 were chosen and 
we terms with Benjamini values of < 0.05 in each 
annotation cluster to be candidate functional terms. 

We used the fold change (c) to summarize the 
degree of functional enrichment by calculating formula 
(1) as follows: 


a/m 
caT PR (1) 


a: Number of enriched genes of a given module in 
an annotation function term. b: Number of all enriched 
genes in an annotation function term. m: Number of all 
genes in a given module. n: Number of all candidate 
genes (336). 

Gene sets with clear function enrichments had c 
values of greater than 2 and gene sets likely to have 
functional enrichments had c values between 1 and 2. 

GO results showed that three annotation-clusters 
involved biological processes and molecular functions 
(Table S3). Genes that played roles in cell cycle 
functions exhibited the greatest amount of enrichment 
and these belonged to annotation cluster 1, and 
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Fig. 1 Highly correlated genes’ co-expression network 
(a) Divided into blue and turquoise module (connected with Figure 2). (b) Divided into brown and yellow module. (c) Belong to yellow module. (d) Belong to brown module. 


* 86° AE HN Se Se ye Ba Prog. Biochem. Biophys. 2016; 43 (1) 


10 


0.8 4 


0.4 


0.2 4 


Module colors 


Blue Turquoise Green Brown Yellow 


Fig. 2 Highly correlation genes clustered into five expression modules 


Blue and turquoise module genes co-expressed together, green and brown module genes co-expressed together(connected with Figure 1). 


modules turquoise and blue. Annotation cluster 2 annotation cluster 3 was comprised of genes enriched 
contained genes enriched in nucleotide binding in morphogenesis. These genes tended to cluster into 
activity. They also exhibited the greatest levels of modules yellow and green (Figure 3). 

enrichment in modules turquoise and blue. Finally, 


3.0 
v 
g 2.5 
§ 2.0 4---4e4e4eq-q-d-qd-d-qd-q-d-aq-a- aoc ARR Ao PRR aici icici -r-i cir aie -- 
5 15 
z 10 
= 0.5 
0 
Cluster 1 Cluster 2 Cluster 3 
2 2 gagegage eeE EMM HM DM HM OME AE MAEM A Se we Y 
SRS eVP eV SSS sEEEE EE ES EES ER EC SE BEE BBE 
oO AS an an S 2? 2 2 22500000D o o E = 2 B 
$S ongs Sga Esaa asss nesses saseg EES gge gES 
Sa Jt aS = is SERB SESS SSS SSS SG PSHM PEP SE ea B 
2 Y g 8 B 37H PS h& hSwovwvweavweooe ®S L! AARAA aaa 
3 a 13 "| gQgEeeglPrerrsasa sae spe e ep kP eG PPP ASCH PCE = 
ay > £228 27 J 77°73 SSK SSSESeIU8AS Seger stg 
2 3 S38 ® Fxe2esesFRS SSS ZSEAES ESV“ EFS 
z o €450 228285323 288888 ses Be 5 #8 
O l Ea Savone a e225 22322 8 a8 5 2 3 
w Ba EEn žege ayl 2 —! al SEBS AS D EZ 
o z LS a A Oo = 2 > 2 3 o O 6 =| > f= Q 
aI Sof eegtsé E S.s a2 295 4 | 2 
A 5gs 3 5 = l & E 3p arg e s = 
E] foe ao: G) SO o 3 T 3 S — le 2 = 5 
£ Ze l om a <A zag `g pa 8 
5 = 8 5 oa Ssi a à B y 
Ss B a < 232920 a7 a 5 
5 a2 2 9 Ei 
S D 5 > = 
S a A BS 
a fo) a 
= E a 
S S 
g v 
=! È 
D 
© 5 
| 
Li 
D 
Q 
Biology process Molecular function Biology process 


Fig. 3 The enrichment of genes clustered in turquoise and yellow module 
Genes of annotation cluster 1 and annotation cluster 2 were enriched biology process and molecular function, respectively. However, genes of 


annotation cluster 3 were enriched biology process. 
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Genes in modules turquoise and blue were 
expressed preferentially in female fig wasps, but not in 
males (Figure S5). This discovery suggested that these 
genes might play important roles in the development of 
females. For example, CSO_000580, which was 
annotated as FBgn0000063, might have protein kinase 
activity" and play roles in female meiosis ””, female 
meiosis chromosome segregation °! and meiotic 
spindle assembly checkpoint™!. Thus, this gene might 
be involved in oogenesis. Another example, 
CSO_000814, which was annotated by FBgn0004859, 
might also play roles in oogenesis“. These results 
suggested that clustered highly expressed genes in 
adult female fig wasps special roles in adult female 
development. 

Genes clustering in modules yellow and green 
were highly expressed in the pupa stage of the fig 
wasp. Female and male pupa exhibited the same levels 
of expression. Thus, these genes might play roles in 
pupa morphogenesis. For example, CSO_010218, 
which was annotated by FBgn0030662"!, might be a 
component of the golgi cisterna membrane and be 
involved in glucuronosyltransferase activity; it may 
also function in the chondroitin sulfate biosynthetic 
process, Ecdysteroid UDP-glucosyltransferases were 
shown to play a role in the conjugation of ecdysteroids 
with glucose or galactose ™™. Similarly, CSO_004410 
(annotated by FBgn0086680) was found to be a RNA 
polymerase [| core promoter in the proximal region of 
sequence-specific DNA binding transcription factor 
activity involved in positive regulation of transcription. 
Further, CSO_004410 might play roles in motor 
neuron axon guidance ™!, regulation of dendrite 
morphogenesis™ and brain development". 


3 Discussion 


3.1 Highly co-expressed genes clustered into five 
expression-modules 

The fig wasp is a good organism for studying the 
co-expression of genes in males and females that 
exhibit 
obtained 2 938 differentially expressed genes from 
11 506 annotated genes in RNA-seq data. GE values 
were used to remove lowly expressed genes!” and CV 


extreme morphological dimorphism. We 


values were employed to identify differentially 
expressed genes |". To ascertain the most relevant 
WGCNA_ was used to highly 


co-expressed genes, those with a weight greater than 


genes, cluster 


0.476 "I, into five expression-modules. Genes in 


modules turquoise and blue were co-expressed, as 
were genes in module brown and green. Genes in 
modules turquoise and blue were highly expressed in 
female fig wasp only. Further, genes in module green, 
brown and yellow were expressed mainly in pupae, 
suggesting involvement in the development of fig 
wasps. Genes in module yellow did not cluster with 
genes in modules brown or green, suggesting unique 
functions of genes in each module. Five differentially 
expressed gene modules were obtained, and genes 
function of these modules might be different to each 
other. The finding was useful for the exploration of fig 
wasp gene functions. 
3.2 Modules with a low-fold change score are 
revealing 

Highly correlated genes can facilitate explorations 
into the roles genes play in divergent phenotypes. 
Different modules appear to have different functional 
enrichments. Modules with a low-fold change score 
might have unknown functional enrichments. For 
example, genes in modules blue and turquoise belong 
to the same co-expression network, but they have 
different functional enrichment scores and the same 
situation occurs in modules brown and green. These 
observations suggest that most genes functions await 
detailed annotation. 
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Fig. S1 Frequency distribution of characteristic path length 
This figure was output directly by NetworkAnalyzer in Cytoscape v2.6.3. The x-axis represents the shortest path length, which represents the expected 


distance between two connected nodes, and the y-axis represents frequency. 
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Fig. S2 Frequency distribution of shared neighbors 
This figure was output directly by NetworkAnalyzer in Cytoscape v2.6.3. The shared neighbors represent the connections between neighbors. The x-axis 


represents the number of shared neighbors, and the y-axis represents frequency. 
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Fig. S3 Topological coefficient of neighbors 
This figure was output directly by NetworkAnalyzer in Cytoscape v2.6.3. The topological coefficient is a relative measure of the extent to which a node 


shares neighbors with other nodes. The x-axis represents the number of neighbors, and the y-axis represents the topological coefficient. 
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Fig. S4 Node degree distribution 
This figure was output directly by NetworkAnalyzer in Cytoscape v2.6.3. The number of nodes (genes) plotted as a property of their degree (number of 


connections with other nodes) shows a power-law like distribution, which indicates a scale-free network topology. 
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Fig. S5 Heatmap for the expression of genes in five modules 


This figure was created using Cluster 3.0 and Treeview. The bar in the lower left shows that the redder colors indicate higher expression, and the more 


brilliant green colors represent lower expression. The x-axis represents four different developmental stages in the fig wasp (larva, 21-day pupa, 25-day 


pupa and adults), while the y-axis represents the different genes. 


2016; 43 (1) 


BAL, S: A) MeRARR 


Table S1 Simple parameters of the network 


Serial number Property Value 
1 Clustering coefficient 0.408” 
2 Connected components 4 
3 Network centralization 0.162” 
4 Network heterogeneity 1.310” 
5 Characteristic path length 3.462? 
6 Network radius 1? 

7 Network diameter 10? 

8 Avg. number of neighbors 8.214» 
9 Number of nodes 336 
10 Network density 0.025 
11 Shortest path 40306” 


Column | lists the number of simple parameters. Column 2 lists the names of the basic properties. Column 
3 lists the corresponding value of each property in the gene co-expression network. ! The parameter 


associated with scale-free distribution. ®? The parameter associated with shortest path length. » The 


parameter associated with the number of neighboring nodes. 


